Abstract: For single-carrier systems with frequency domain
equalization, decision feedback equalization (DFE) performs bet- 
ter than linear equalization and has much lower computational 
complexity than sequence maximum likelihood detection. The 
main challenge in DFE is the feedback symbol selection rule. 
In this paper, we give a theoretical framework for a simple, 
sparsity based thresholding algorithm. We feed back multiple 
symbols in each iteration, so the algorithm converges fast and has 
a low computational cost. We show how the initial solution can 
be obtained via convex relaxation instead of linear equalization, 
and illustrate the impact that the choice of the initial solution 
has on the bit error rate performance of our algorithm. The 
algorithm is applicable in several existing wireless communication 
systems (SC-FDMA, MC-CDMA, MIMO-OFDM). Numerical 
results illustrate significant performance improvement in terms 
of bit error rate compared to the MMSE solution. 



I. Introduction 

In broadband, high data-rate, wireless communication sys- 
tems, the effect of multipath propagation can be severe. While 
orthogonal frequency division multiplexing (OFDM) success- 
fully deals with multipath, it is a multicarrier modulation that 
suffers from a large peak to average power ratio (PAPR). On 
the other hand, a more traditional single carrier modulation 
with time domain equalization approach is unattractive, due 
to the high complexity of the receiver and required signal 
processing time. When single carrier modulation is used in 
combination with frequency domain equalization, one attempts 
to approach the performance and complexity of OFDM, while 
maintaining a lower PAPR compared to OFDM [1].

Single carrier frequency division multiple access (SC-FDMA)
is a single carrier technique that has lately received
much attention as an alternative to orthogonal frequency
division multiple access for 4G technology. SC-FDMA has
been adopted as the uplink transmission technique in both the 3GPP
Long Term Evolution (LTE) and LTE Advanced standards [2].
Since most of the cost in communication terminals comes from 
the power amplifier, a lower PAPR can significantly reduce the 
cost of mobile units. This results in more power-efficient and
less complex mobile terminals. Since the orthogonal frequency
division multiple access (OFDMA) is used in the downlink, 
both the burdens of complex frequency domain equalizer 
needed for the SC-FDMA and accommodating large PAPR 
in OFDMA rest upon the base station. 

Frequency domain equalization includes frequency domain 
linear equalization, decision feedback equalization and turbo 
equalization [3]. For frequency selective channels, decision
feedback equalization (DFE) gives much better performance 
than linear equalization and has a lower complexity and com- 
putational cost than optimum equalizers and turbo equalizers. 
The basic idea behind the DFE is to subtract (feed back)
correctly equalized symbols in order to reduce the interference 
for the currently equalized symbols. If the wrong symbols 
are fed back, the interference will be further increased, so 
choosing which symbols are correct and should be fed back 
is a crucial step for any decision feedback algorithm. Existing 
DFE algorithms are mostly based on finding the minimum
mean square error (MMSE) solution of the system,
and then forming some metric (such as covariance matrix, or 
mean square error matrix), associated with that solution. The 
element of the solution that corresponds to the minimum of 
that metric is assumed to be the one that is most likely correct, 
and it is fed back. The equalizer is usually implemented using
a frequency domain feed-forward and time domain feed-back
filter, such as in [4] and [5]. Vertical Bell Labs Layered Space
Time (V-BLAST) [6], [7] has been proposed as a receiver
architecture for MIMO systems and can be viewed as a generalized
decision feedback equalizer [8]. The drawback is that only one
symbol is fed back in each iteration, so the number of iterations
grows linearly with the block length. Even if multiple symbols are
fed back, there is no general or systematic rule on how many symbols
should be fed back; the number is fixed in each iteration. In this
paper we address these issues with an adaptive thresholding 
rule for feedback symbol selection. Motivated by recent work 
in sparse recovery and compressive sensing [9], our algorithm
gives a theoretical framework, based on sparsity, for multiple 
symbol feedback selection. Our algorithm converges in very 
few iterations and its performance substantially improves upon 
MMSE equalization. We note here that a similar concept, 
successive interference cancellation, exists in multiple access 
schemes, where users cause interference for each other. This is
especially challenging in cases such as code division multiple
access (CDMA), where there is no strict time or frequency
orthogonality between different users [10], [11].

The rest of the paper is organized as follows. In Section II
we give the problem statement. In Section III we present
two ways of obtaining an initial solution for our algorithm and
make the connection between sparsity of the error signal and
the optimal thresholding rule for the DFE. Furthermore, we
introduce an adaptive thresholding algorithm. Section IV
is devoted to numerical results. Finally, in Section V we
give our concluding remarks and discuss open problems.



II. Problem Statement 



A. SC-FDMA 



While the decision feedback algorithm presented in this 
paper can be applied to several different technologies, such 
as MC-CDMA, MIMO OFDM, in this paper we focus on SC- 
FDMA. We will describe the SC-FDMA system model, and 
then explain how this model can be extended to other systems. 
Fig. 1. Transmitter and Receiver Model for SC-FDMA



Figure 1 depicts the high-level model of an SC-FDMA
transmitter and receiver. First, m modulated source symbols are
converted to the frequency domain. The frequency domain symbols
are then mapped onto m out of n (m < n) possible
orthogonal subcarriers. Subcarriers can be mapped in two 
ways: localized mapping, where each user is assigned a 
set of m consecutive subcarriers, and distributed mapping, 
where subcarriers assigned to the user are equally spaced 
across the entire channel bandwidth. After converting the 
symbols back to the time domain using an n-point IDFT 
and inserting the cyclic prefix, the SC-FDMA time domain 
symbol is transmitted through the channel. At the receiver all 
the steps are reversed. The crucial difference between the SC- 
FDMA and OFDMA comes from the additional DFT block 
before subcarrier mapping (shaded in the figure). The DFT 
block "spreads" the modulated source symbols, so that each 
subcarrier in frequency domain contains information about all 
the source symbols. While this has an advantage of multipath 
diversity, it also destroys the decoupling of the source symbols, 
since we no longer have one-to-one mapping between the 
source symbols and subcarriers. The result is that, unlike in
OFDM, simple, one-tap equalization combined with symbol- 
by-symbol detection is not equivalent to maximum likelihood 
detection (MLD). In fact, the complexity of MLD for SC- 
FDMA grows exponentially with the block size, m, making 
it unsuitable for practical purposes. Sphere decoding can be
successfully implemented with lower complexity than MLD;
however, for large block sizes m, the complexity is still too
high.

It is convenient to consider a matrix formulation of an SC-FDMA
system. In particular, for one user, the received vector
$Y \in \mathbb{C}^m$ in the time domain (see e.g. equation (11) of [5]) is
given by

$$Y = F^{-1}(F H' F^{-1}) F x + w, \tag{1}$$

where $F$ is an $m \times m$ DFT matrix, $H' \in \mathbb{C}^{m \times m}$ is a circulant
channel matrix, $x \in \mathbb{C}^m$ is a vector of modulated source
symbols, and $w \in \mathbb{C}^m$ is additive white Gaussian noise (AWGN).
Since we are interested in frequency domain equalization,
from (1) we can get the following

$$y = HFx + \omega, \tag{2}$$

where $y \in \mathbb{C}^n$ is a received vector for one user in the frequency
domain and $H$ is the diagonalized channel matrix.

We assume that the channel is Rayleigh fading, and that the
rows of $H$ are normalized. Defining $A = HF$, our system
becomes

$$y = Ax + \omega. \tag{3}$$

It is easy to see that by substituting the matrix $F$ in (2)
with any unitary "spreading" matrix $U$, such as a Hadamard,
Haar or random Gaussian matrix, we get a more general
model. The choice of $U$ depends on the particular system
being modeled. We also note that in this paper we assume
that the receiver knows both the channel matrix $H$ and the
spreading matrix $U$. While we assumed for convenience that
$A$ is a square matrix, we emphasize that all results in our paper
can be easily extended to the case of tall matrices $A$.
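To make the model $y = Ax + \omega$ concrete, the following is a minimal NumPy sketch of how such a system could be simulated. The function names `build_system` and `transmit` are hypothetical, and the normalization used here (scaling the diagonal channel so that its average energy is one) is only one possible reading of "the rows of $H$ are normalized".

```python
import numpy as np

def build_system(m, rng, spreading="dft"):
    """Return (A, H, U) with A = H U, for the model y = A x + w."""
    if spreading == "dft":
        # Unitary DFT matrix: the FFT of the identity with orthonormal scaling.
        U = np.fft.fft(np.eye(m), norm="ortho")
    else:
        # Any other unitary matrix works; here the Q factor of a random matrix.
        g = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
        U, _ = np.linalg.qr(g)
    # Rayleigh fading: i.i.d. complex Gaussian gains on the diagonal of H.
    h = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
    # Assumed normalization: unit average channel energy.
    h /= np.sqrt(np.mean(np.abs(h) ** 2))
    H = np.diag(h)
    return H @ U, H, U

def transmit(A, x, sigma2, rng):
    """Return y = A x + w, where w is complex AWGN with variance sigma2."""
    n = A.shape[0]
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return A @ x + w
```

With the DFT choice and this normalization, the diagonal entries of $A^*A$ are equal to one, which is the property used later when analyzing the error estimate.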

Ideally, we would like to find the maximum likelihood (ML)
solution of (3), given by¹

$$\arg\min_{x \in \mathbb{S}^m} \|y - Ax\|_2, \tag{4}$$

where $\mathbb{S}^m$ is the space of all vectors of length $m$ whose
elements are picked from a given constellation $\mathbb{S}$ (e.g., for
BPSK we have $\mathbb{S} = \{-1, +1\}$). As mentioned above, the ML
solution is optimal, but the complexity of solving (4) grows
exponentially with $m$, and therefore it cannot be used for
practical purposes even for small $m$. While sphere decoding
reduces the computational complexity of ML considerably, it
is still too costly for moderate or large $m$.
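The exponential cost of exhaustive ML detection can be seen directly in a brute-force sketch. The helper `ml_detect` below is a hypothetical illustration, not an implementation used in the paper: it enumerates all $|\mathbb{S}|^m$ candidate vectors, so it is only usable for very small $m$.

```python
import itertools
import numpy as np

def ml_detect(A, y, constellation):
    """Exhaustive search for the vector in S^m that minimizes ||y - A x||_2."""
    m = A.shape[1]
    best, best_cost = None, np.inf
    # |S|^m candidates: the loop length grows exponentially with m.
    for cand in itertools.product(constellation, repeat=m):
        x = np.array(cand)
        cost = np.linalg.norm(y - A @ x)
        if cost < best_cost:
            best, best_cost = x, cost
    return best
```

For BPSK and $m = 3$ this loop visits 8 candidates; for $m = 128$ it would have to visit $2^{128}$.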

In the literature, the terms equalization and detection are
often (mistakenly) used interchangeably, but in our case it
is important to distinguish between the two. Equalization
refers to operations done on the observation vector
$y$ in order to obtain the estimate of the transmitted vector
(such as minimum mean square error equalization, or least
squares equalization). However, at this stage, the estimate still
contains the "soft" information, and not the actual symbols
from the used constellation. The mapping of the estimate
into the symbols of the used constellation (such as BPSK, or
QPSK) is detection. The point of equalization is to allow for
a simple coefficient-by-coefficient detection of the equalized
vector instead of the computationally expensive sequence
detection done in (4) (for ML there is of course no need for
equalization, as we immediately obtain the detected solution).
In this paper, we feed back the detected symbols, and not
the soft information, so from here on, when we talk about
obtaining and feeding back the initial solution, we are referring
to the detected symbols.

¹The 2-norm of a vector $a$ of length $n$ is denoted by $\|a\|_2 = \sqrt{\sum_i |a_i|^2}$.
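The distinction between the soft estimate and the detected symbols can be illustrated with a small coefficient-by-coefficient slicer. This is a sketch under the assumption of an unnormalized QPSK constellation $\{\pm 1 \pm j\}$; the name `detect_qpsk` is hypothetical.

```python
import numpy as np

def detect_qpsk(x_soft):
    """Map each soft coefficient to the nearest point of {+-1 +- 1j}."""
    x_soft = np.asarray(x_soft, dtype=complex)
    # Decide the real and imaginary parts independently by their sign.
    re = np.where(x_soft.real >= 0, 1.0, -1.0)
    im = np.where(x_soft.imag >= 0, 1.0, -1.0)
    return re + 1j * im
```

Equalization produces the input of this function; detection is the function itself.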

B. Decision Feedback Equalization 

To explain the idea behind decision feedback equalization,
let us assume that we want to equalize the $l$-th symbol in
vector $x$. We can rewrite $y$ as

$$y = A^l x_l + \sum_{i \in L} A^i x_i + \omega,$$

where $L = \{i \in \mathbb{Z} \mid 0 \le i \le n-1,\ i \ne l\}$ and $A^i$
denotes the $i$-th column of matrix $A$. The first term in the
last equation is simply the symbol we want to equalize, $x_l$,
scaled by the channel. The summation term, $I = \sum_{i \in L} A^i x_i$,
at least as far as equalization of $x_l$ is concerned, is viewed as
interference. The hope is that if we have previously correctly
equalized and detected some of the $x_i$, $i \in P$, where $P \subset L$,
we can use that knowledge to reconstruct $I_P = \sum_{i \in P} A^i x_i$
and subtract it from $y$. In this way, we are subtracting the
contributions of interference from our observation. Basically,
for the purpose of equalization of $x_l$, the interference is
reduced, which gives us a better chance of recovering $x_l$
correctly. In the subsequent iterations, we will have a reduced
system, since we will omit the columns of $A$ that correspond
to the index set of correctly equalized symbols in the previous
iteration. So our system for all iterations $k > 0$ will be
overdetermined, which increases our likelihood of recovering
correctly the remaining symbols.
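The interference subtraction and the resulting reduced (overdetermined) system can be sketched as follows. The helper name `cancel_interference` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def cancel_interference(A, y, x_known, known_idx):
    """Subtract the contribution of already detected symbols from y.

    Returns the cleaned observation, the reduced matrix (columns of the
    remaining unknowns only) and the indices of those unknowns.
    """
    known_idx = np.asarray(known_idx, dtype=int)
    rest_idx = np.setdiff1d(np.arange(A.shape[1]), known_idx)
    # I_P = sum over i in P of A^i x_i, removed from the observation.
    y_reduced = y - A[:, known_idx] @ x_known
    return y_reduced, A[:, rest_idx], rest_idx
```

The reduced matrix has the same number of rows as $A$ but fewer columns, so a least squares solve on it is overdetermined.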

While this concept sounds very nice in theory, in practice we 
face a very difficult question: how do we know which symbols 
are equalized correctly and should be fed back? Unfortunately, 
there is no way to ensure that we are feeding back the correct 
symbols. It is even more unfortunate that if we feed back 
the wrong symbols, we further increase the interference and 
cause error propagation. Obviously, the performance of any 
DFE algorithm is determined by the selection rule of the 
feedback symbols. The other question that arises is how many 
symbols should we feed back in each iteration. While feeding 
back one symbol at a time, as is done in V-BLAST [6], may
seem like the safest option, the computational time that it 
requires for larger block sizes, m, might be unacceptable for 
some applications. Also, in a good signal to noise ratio (SNR)
situation, the majority of the symbols would most likely be
correct, so feeding back one symbol at a time would be a waste 
of resources. Hence there is a tradeoff: from the performance 
point of view, we would rather feed back fewer symbols, that 
are guaranteed to be correct, while from a computational point 
of view we want to feed back as many symbols as possible in 
each iteration, in order to have fewer iterations. 

Let us assume for the moment that $x$ is known at the
receiver. Then we would be able to compute the error signal
given by

$$e = x - \hat{x}, \tag{5}$$

where $\hat{x}$ is the estimate of $x$ obtained at the receiver after
equalization and detection. Note that for each $\hat{x}_i$,
$i = 0, \dots, m-1$, that matches $x_i$, the corresponding entry
$e_i$ would be 0. So, assuming that we did a decent job of
estimating $x$, $e$ is a sparse vector, where the locations
of the non-zero entries of $e$ correspond to the locations of
errors we made in our estimate of $x$. One realization of $e$ is
shown in Figure 2(a). We can immediately see that knowing
this error vector would be ideal for our DFE selection rule:
if we knew the locations of errors, we would simply not feed
back the symbols that correspond to them, while we could
safely feed back all symbols whose entries correspond to the
zero entries of $e$.
(a) Absolute value of the true error signal in the first iteration, $|e|$

(b) Absolute value of the estimated error signal in the first iteration, $|\hat{e}|$

Fig. 2. Comparison of the true and estimated error signals
Unfortunately, the true solution $x$ is not known at the
receiver, so we cannot construct the error signal $e$ given
by (5). We can try to obtain an estimate $\hat{e}$ of $e$, and use
this information for our feedback selection rule. One such
estimate is shown in Figure 2(b). We can see that the largest
peaks in Figure 2(b) correspond to the locations of errors in
Figure 2(a). However, there are a lot of small peaks that come
from the noise, and our goal is to come up with a threshold
rule that will be able to distinguish the "true" peaks in the
estimated error signal from the noise. Also, as we reduce the
interference in the subsequent iterations the error signal will
look different, which means that the chosen threshold should
adapt appropriately.

From our previous discussion we can see that in order to 
design an efficient decision feedback equalization algorithm 
that utilizes iterative adaptive thresholding of the error signal, 
we need to provide answers to the following crucial questions: 

1) How do we find the initial solution, $\hat{x}$?

2) How do we obtain the error estimate $\hat{e}$?

3) How do we design a threshold that will separate true
peaks from the noise, and adapt to the error signal in
each iteration?



III. Successive Interference Cancellation with
Adaptive Thresholding

A. Initial Solution via Linear Equalization

In a decision feedback algorithm, in each iteration, we first
must obtain an initial solution that will be used to determine
which symbols are correctly equalized and should be fed back.
A solution closer to the actual transmitted vector
gives more accurate information for our decision feedback
rule, so obtaining a good estimate of $x$ in each iteration
has a direct impact on the performance of our algorithm.

The simplest way to obtain $\hat{x}$ is using zero forcing (ZF)

$$x_{ZF} = A^*(AA^*)^{-1} y,$$

or an MMSE solution

$$x_{MMSE} = A^*(AA^* + \sigma^2 I)^{-1} y.$$

For instance, for MMSE, $\hat{x}$ is now obtained from $x_{MMSE}$ by
projecting each coefficient of $x_{MMSE}$ onto $\mathbb{S}$.

Unfortunately, large noise enhancement severely degrades
the performance of ZF. MMSE offers better performance than
ZF, but the ISI is still present [5].

B. Initial Solution via Convex Relaxation

From a computational viewpoint the problem with the
optimization problem (4) is that we need to find the minimum
over a non-convex set, the symbol space $\mathbb{S}^m$. A natural idea
is then to consider a convex relaxation of (4) by replacing $\mathbb{S}$
by its convex hull $\mathrm{conv}\,\mathbb{S}$ (for a definition of a convex hull
see [12]). Thus instead of (4) we are concerned with

$$x = \arg\min_{x \in \mathrm{conv}\,\mathbb{S}^m} \|y - Ax\|_2. \tag{6}$$

Clearly, $\mathrm{conv}\,\mathbb{S}^m = (\mathrm{conv}\,\mathbb{S})^m$. For instance, for QPSK,
$\mathrm{conv}\,\mathbb{S} = \{x \in \mathbb{C} : \max\{|\Re\{x\}|, |\Im\{x\}|\} \le 1\}$. Thus in
that case (6) can be expressed as²

$$\min_x \|Ax - y\|_2 \quad \text{s.t.} \quad \|\Re\{x\}\|_\infty \le 1,\ \|\Im\{x\}\|_\infty \le 1. \tag{7}$$

Some theoretical results for the noise-free, underdetermined
setting and the special case $\mathbb{S} = \{\pm 1\}$ can be found in [13],
[14]. However, in our case the issue is not underdeterminedness,
but noise. Therefore the results in the aforementioned
papers have little bearing on our situation.

We note here that while the solution obtained via (6) leads
to a better performance than MMSE (as we will show in
Section IV), the computational cost of solving (7) is higher.
Nevertheless, due to recent progress in convex optimization
(partly driven by the thriving area of compressive sensing)
we now have a number of fast algorithms for the solution of
problems like (6).

Remark: Because of the noise, the solution we obtain by
solving (6) or (7) will not necessarily be from the finite alphabet
of our constellation. So in order to obtain our $\hat{x}$ we still have
to perform a symbol-by-symbol detection step as discussed in
the previous section. The same is true for $x_{ZF}$ or $x_{MMSE}$.
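Both ways of obtaining an initial solution can be sketched in a few lines of NumPy. The MMSE equalizer follows the closed form $A^*(AA^* + \sigma^2 I)^{-1}y$. For the convex relaxation with a QPSK box constraint, the sketch below uses projected gradient descent as a stand-in solver; this is a swapped-in technique chosen to keep the example self-contained, not the solver used for the numerical results. The function names are hypothetical, and the QPSK constellation is assumed to be $\{\pm 1 \pm j\}$.

```python
import numpy as np

def mmse_equalize(A, y, sigma2):
    """x_MMSE = A^* (A A^* + sigma^2 I)^{-1} y."""
    n = A.shape[0]
    G = A @ A.conj().T + sigma2 * np.eye(n)
    return A.conj().T @ np.linalg.solve(G, y)

def box_relaxation(A, y, n_iter=500):
    """Minimize ||A x - y||_2 subject to |Re x_i| <= 1 and |Im x_i| <= 1.

    Projected gradient descent on the squared objective, with step 1/L
    where L is the squared spectral norm of A.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        x = x - step * (A.conj().T @ (A @ x - y))   # gradient step
        # Projection onto the box: clip real and imaginary parts separately.
        x = np.clip(x.real, -1.0, 1.0) + 1j * np.clip(x.imag, -1.0, 1.0)
    return x

def detect_qpsk(x_soft):
    """Coefficient-by-coefficient detection onto {+-1 +- 1j}."""
    x_soft = np.asarray(x_soft, dtype=complex)
    return (np.where(x_soft.real >= 0, 1.0, -1.0)
            + 1j * np.where(x_soft.imag >= 0, 1.0, -1.0))
```

In either case the soft output still has to be passed through `detect_qpsk` to obtain symbols from the constellation.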



C. Error Signal And Adaptive Thresholding 

In the area of compressed sensing, greedy algorithms have
been successfully used in finding the sparsest solution for
large, underdetermined systems. While the recovery of our
error signal does not fall into the category of a large
underdetermined system with a sparse solution, our approach
is inspired by Stagewise Orthogonal Matching Pursuit
(StOMP), an iterative thresholding algorithm for finding sparse
solutions [9]. We use a similar idea for determining which
symbols in our current solution are correct and should be fed
back in order to reduce the interference for the next iteration.

Let us assume that in the $k$-th iteration we have obtained
$\hat{x}_k$. Then we can form the corresponding residual, $r_k$, as

$$r_k = y_k - A_k \hat{x}_k, \tag{8}$$

where $A_k$ denotes the matrix that is obtained from matrix $A$
by leaving out the columns that correspond to the index set
of correctly equalized symbols in each previous iteration (the
number of rows of $A_k$ is still $m$, but the number of columns
gets smaller in each iteration). Then the estimate of $e$ in the
$k$-th iteration is given by

$$\hat{e}_k = A_k^* r_k. \tag{9}$$

The key observation is that the vector $\hat{e}$ can be viewed as a
sparse, spiky signal embedded in noise, and therefore we can
represent it as

$$\hat{e}_k = e_k + z_k, \tag{10}$$

where $z_k$ is the noise term in the $k$-th iteration. We will later
show that under certain conditions $z$ is approximately additive
white Gaussian noise (AWGN).

²The infinity-norm of a vector $a$ of length $n$ is given by $\|a\|_\infty = \max_i |a_i|$.
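The residual and the error estimate are each a single matrix-vector product, as the following sketch shows. The function name `error_estimate` is hypothetical.

```python
import numpy as np

def error_estimate(A_k, y_k, x_hat_k):
    """Return the residual r_k = y_k - A_k x_hat_k and e_hat_k = A_k^* r_k."""
    r_k = y_k - A_k @ x_hat_k
    e_hat_k = A_k.conj().T @ r_k
    return r_k, e_hat_k
```

In the idealized case of a noise-free system with a unitary $A$, the estimate coincides with the true error signal $e = x - \hat{x}$; with a fading channel and noise it is the true error plus the noise term $z_k$.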
Now that we have an estimate of $e$, we need to
come up with a threshold which will help us determine which
entries in $\hat{e}$ are small enough to be considered just noise (no
error was made for that index) and thus should be fed back.

It is a well-known result that the maximum of a random
Gaussian sequence $c \in \mathbb{C}^m$, $c_k \sim \mathcal{CN}(0, \sigma^2)$, is bounded
by [15]

$$\max_k |c_k| \le \sqrt{2\sigma^2 \log m}, \quad k = 0, \dots, m-1, \tag{11}$$

with high probability. So if we had an unknown "spiky"
function embedded in AWGN, (11) would be a natural choice
for the threshold that would distinguish between the spikes
and the noise: we could assume with very high probability
that everything that is below (11) is indeed just noise and
not a "true" spike. In [16], the authors use (11) to obtain an
optimal threshold rule for recovering a sparse signal embedded
in AWGN that adapts to the level of sparsity. They modify
(11) by exploiting the number of spikes (level of sparsity)
of the function that they are thresholding. In particular, their
proposed threshold is given by

$$\sigma_n \sqrt{2(1-\beta)\log m}, \quad 0 < \beta < 1, \tag{12}$$

where $p = m^\beta$ is the level of sparsity, and $\sigma_n^2$ the variance of
the noise term. Via a simple calculation (12) can be expressed
as

$$t_p = \sigma_n \sqrt{2\log(m/p)}, \tag{13}$$

which is more convenient for our purposes. The threshold
depends on $\log(m/p)$, rather than just $\log m$, and the penalty
factor of $\log(m/p)$ accounts for the number of spikes that we
are expecting. So the more spikes we have (the less sparse the
signal is), the lower the threshold gets. Clearly, in the case that
the signal has only one spike, $p = 1$, equation (13) reduces
to (11).

We emphasize here that our objective is different from the
one in [16] or [9]: we are not interested in recovering the
amplitudes of non-zero elements (spikes) of the error signal,
as is the case in compressed sensing applications. We
are only interested in the positions that are zero, or very
close to zero, since those are the entries that we need to
feed back to reduce the interference. In other words, we
are only interested in locations of entries that are below the
threshold. Furthermore, we point out that in our case, if we
"miss" some zero locations in a given iteration, we do not
face a performance penalty; it just means that we might have
more iterations. However, if we feed back a location that is
actually a spike, we increase the interference and cause error
propagation. In that sense, our problem is not symmetric,
so for our purposes it is better to feed back fewer entries
(which corresponds to choosing a lower threshold) than to
feed back the wrong entries. Obviously, the "safest" threshold
rule would be to find the error estimate $\hat{e}$, and feed back only
the smallest entry of $|\hat{e}|$, but then the number of iterations
needed would be equal to the block length $m$. We will show
in Section IV that while feeding back one symbol per iteration
does have a superior BER performance compared to our
adaptive thresholding rule, the computational times are very
high.
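The dependence of the threshold on the expected number of spikes is easy to tabulate. The sketch below assumes the sparsity-adapted threshold has the form $\sigma_n\sqrt{2\log(m/p)}$ and the universal bound the form $\sqrt{2\sigma^2\log m}$; the function names are hypothetical.

```python
import math

def universal_threshold(sigma2, m):
    """Bound on the maximum of m Gaussian noise samples with variance sigma2."""
    return math.sqrt(2.0 * sigma2 * math.log(m))

def sparsity_threshold(sigma_n, m, p):
    """Threshold adapted to an expected number p of spikes among m entries."""
    return sigma_n * math.sqrt(2.0 * math.log(m / p))
```

For $m = 128$ and $\sigma_n = 1$, a single expected spike gives a threshold of about 3.11, while $p = 4$ lowers it to about 2.63: the more errors we expect, the fewer entries fall below the threshold and are fed back.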



In our case the threshold in the $k$-th iteration becomes

$$t_k = \sqrt{2\log(m_k/p_k)\,\mathcal{E}[\|z_k\|^2]}. \tag{14}$$

From (14) we can see that we still need to obtain the level
of sparsity, $p_k$, as well as the variance of the noise term $z_k$.
The level of sparsity is determined by the number of errors
we make in our solution. This number will be different in
every iteration, so our threshold has to adapt appropriately.
We obviously cannot know the number of errors, $p$, that
occurred in our current solution, but we need to know at least
approximately the level of sparsity of the actual error vector
$e$. We can obtain this estimate in the $k$-th iteration as

$$p_k = \|\hat{e}_k\|_2 / s_{\min}, \tag{15}$$

where $s_{\min}$ is the minimum distance among symbols for
the used constellation. Note that the number of unknowns
decreases from one iteration to the next, hence the length
$m_k$ of $\hat{e}_k$ will also change in every iteration.

The validity of using (14) as an optimal threshold is based
on the assumption that the noise $z$ in (10) is AWGN. The
following theorem shows that this is indeed the case
(asymptotically), at least in the first iteration, and therefore
using (14) is justified.

Theorem 3.1: Let $z_0$ be defined as $z_0 = \hat{e}_0 - e_0$, where $\hat{e}_0$
and $e_0$ are defined in (9) and (5), and $e_0$ has zero mean.
Let matrix $A$ from (3) be a square matrix ($m = n$). Then
the entries of $z_0$, $z_{0,i}$, $i = 0, \dots, n-1$, are asymptotically
i.i.d. normally distributed with zero mean and variance
$\mathcal{E}[\|e\|^2]/m + \sigma^2$.

Proof: For clarity of presentation, throughout this proof
we will omit the iteration index, 0, but we emphasize that the
proof applies only to the first ($k = 0$) iteration.

We can write $z$ in the following way:

$$\begin{aligned}
z &= \hat{e} - e \\
  &= A^* r - e \\
  &= A^*(y - A\hat{x}) - e \\
  &= A^*(Ax + \omega) - A^*A\hat{x} - e \\
  &= A^*A(x - \hat{x}) - e + A^*\omega \\
  &= A^*Ae - e + A^*\omega \\
  &= (A^*A - I)e + A^*\omega. \tag{16}
\end{aligned}$$
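The chain of equalities above is a purely algebraic identity, so it can be checked numerically for arbitrary inputs. The sketch below computes the noise term both directly, as $\hat{e} - e$, and through the closed form $(A^*A - I)e + A^*\omega$. The function name `noise_term` is hypothetical.

```python
import numpy as np

def noise_term(A, x, x_hat, w):
    """Return z computed directly and via the closed form (A^*A - I) e + A^* w."""
    y = A @ x + w
    e = x - x_hat
    e_hat = A.conj().T @ (y - A @ x_hat)     # error estimate A^* r
    z_direct = e_hat - e
    m = A.shape[1]
    z_closed = (A.conj().T @ A - np.eye(m)) @ e + A.conj().T @ w
    return z_direct, z_closed
```

The two outputs agree up to floating point rounding for any $A$, $x$, $\hat{x}$ and $\omega$; the statistical claims of the theorem concern the distribution of this vector, not the identity itself.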

Since $F$ (or in the more general case $U$) is a unitary matrix,
and the rows of the channel matrix are normalized to have
unit energy on average, we have that $(A^*A)_{ii} = 1$ and we can
write the $i$-th entry in $z$ as

$$z_i = \sum_{j \ne i} (A_i^* A^j)\, e_j + A_i^* \omega. \tag{17}$$

Using the Central Limit Theorem, both terms in (17) will
be normally distributed in the limit since $H$, $U$, $e$, and $\omega$ are
uncorrelated. As a sum of two normally distributed variables,
$z_i$ will also have a normal distribution. The mean of $z_i$ will
be zero since both $e$ and $\omega$ have zero mean. In order to find
the variance of $z_i$, $\sigma_z^2$, we first find the variance $\sigma_1^2$ of
the first term in (17). We have that

$$A_i^* A^j = \sum_{k=0}^{m-1} \bar{a}_{ki}\, a_{kj}, \tag{18}$$

where $A_i^*$ is the $i$-th row of $A^*$, $A^j$ is the $j$-th column of $A$, and $a_{ki}$ are
entries of matrix $A$. Since each entry $a_{ki}$ has a
magnitude of $1/\sqrt{m}$ on average, the variance of each entry is
then $1/m$. Using the Central Limit Theorem again, the variance of (18)
is then $1/m$. The variance of $e_j$ is $\mathcal{E}[\|e\|^2]/m$ by definition,
so we have that $\sigma_1^2$ is given by

$$\sigma_1^2 = (m-1) \cdot \frac{1}{m} \cdot \frac{\mathcal{E}[\|e\|^2]}{m} \approx \frac{\mathcal{E}[\|e\|^2]}{m}. \tag{19}$$

The second term in (17) is simply a sum of Gaussian random
variables, so it remains Gaussian with zero mean and variance
$\sigma^2$. So finally, we have that the variance of $z_i$ is given by

$$\sigma_z^2 = \frac{\mathcal{E}[\|e\|^2]}{m} + \sigma^2. \tag{20}$$

We have shown that the variance of the noise term in (10)
is $\mathcal{E}[\|e\|^2]/m + \sigma^2$. If there are errors in the estimated solution $\hat{x}$, we can
assume, especially for higher SNR values, that $\sigma^2$ is much
smaller than the term that comes from the interference in (20),
hence

$$\sigma_z^2 \approx \frac{\mathcal{E}[\|e\|^2]}{m}.$$

Since $H$ is normalized and $U$ is unitary, there holds
$\mathcal{E}[\|HUe\|^2] = \mathcal{E}[\|e\|^2]$. Furthermore we have
$r_k = HUe + \omega$ and thus

$$\mathcal{E}[\|r_k\|^2] = \mathcal{E}[\|HUe\|^2] + \sigma^2.$$

Using the same assumption about $\sigma$ as before, we have the
following approximation:

$$\mathcal{E}[\|e\|^2] \approx \|r_k\|^2. \tag{21}$$



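Substituting (15) and (20) into (13), and using the approximation (21),
we finally obtain our threshold in the $k$-th iteration as

$$t_k = \sqrt{2\log\!\left(\frac{m_k\, s_{\min}}{\|\hat{e}_k\|_2}\right) \frac{\|r_k\|_2^2}{m_k}}. \tag{22}$$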

In Figure 3(b) we show the first iteration ($k = 0$) of
our thresholding algorithm for $x$ of length 128 and a signal to
noise ratio (SNR) of 10 dB. In Figure 3(a) we can see that the
actual error signal has $p = 4$ non-zero values. Our estimate,
obtained by (15), in this case is $p_0 \approx 5$. The corresponding
threshold, obtained by (22), is $t_0 \approx 0.6$. Using this threshold,
in the first iteration we feed back over 100 symbols, and
they are all correct. This illustrates how our algorithm gives
a systematic framework for choosing as many correct symbols
as possible in each iteration.
(a) Absolute value of the true error signal in the first iteration, $|e|$

(b) Absolute value of the estimated error signal in the first iteration, $|\hat{e}|$

Fig. 3. Comparison of the true and estimated error signals

We emphasize that the result in the theorem is valid only in
the first iteration of our thresholding algorithm. Once we start
removing the interference for the subsequent iterations, the
entries in $z$ are no longer uncorrelated. The result in Theorem
3.1 is significant because it allows us to find the optimal
threshold in the first iteration. For all the following iterations,
this threshold is no longer optimal, but our numerical results
show that the majority of the indices are fed back in the first
iteration. The chosen threshold gives satisfactory results for
the other iterations too, even though it might not be optimal.
In Figures 4-6 we show the quantile-quantile plots of
the sample quantiles of $z_k$ versus theoretical quantiles from
a normal distribution, for $k = 0, 1, 2$. Figure 4 illustrates
clearly that in the first iteration $z_0$ is very close to being
normally distributed. In Figures 5 and 6 we see that even
though the majority of the samples of $z_1$ and $z_2$ coincide with
the normal distribution, there are a number of entries that
deviate from it. The latter observation suggests that there
should be room for improvement to our thresholding strategy.
We briefly return to this issue in our Conclusion.
Fig. 4. Quantile-quantile plots of the sample quantiles of $z_0$ versus theoretical
quantiles from a normal distribution (horizontal axis: standard normal quantiles)

Fig. 5. Quantile-quantile plots of the sample quantiles of $z_1$ versus theoretical
quantiles from a normal distribution

D. Algorithm 

Now that we have laid out all the necessary pieces, we are 
ready to present our complete adaptive thresholding decision 
feedback algorithm. 

From the observed vector $y$ we first obtain an initial estimate
of the transmitted vector $x$ using ZF, MMSE or the convex
optimization described in (6), which we detect in order to
obtain $\hat{x}_k$. We find the residual as

$$r_k = y_k - A_k \hat{x}_k,$$

and obtain the estimate of the error signal as

$$\hat{e}_k = A_k^* r_k.$$

We calculate the threshold $t_k$ as

$$t_k = \sqrt{2\log(m_k/p_k)\, \|r_k\|_2^2 / m_k}.$$

We threshold $|\hat{e}_k|$ and obtain the index set $I_{c_k} = \{i \in
\mathbb{Z} \mid |\hat{e}_{i,k}| < t_k\}$. The index set $I_{c_k}$ contains the positions of
all entries in the solution $\hat{x}_k$ that are assumed to be correct. We
then remove the interference caused by the "correct" symbols:

$$y_{k+1} = y_k - A_k(:, I_{c_k})\, \hat{x}_k(I_{c_k}).$$



Fig. 6. Quantile-quantile plots of the sample quantiles of $z_2$ versus
theoretical quantiles from a normal distribution (panel title: Iteration = 2, SNR = 4)

Here, the notation $A(:, I_c)$ denotes that all rows of $A$ are
selected, but only columns that correspond to index set $I_c$
are selected. We form the matrix $A_{k+1}$ to be used in the
subsequent iteration to obtain $\hat{x}_{k+1}$ by leaving out all the
columns of matrix $A_k$ that correspond to index set $I_{c_k}$. Using
$y_{k+1}$ and $A_{k+1}$ we generate the new, smaller, initial solution
and repeat the process until all indices $i = 0, \dots, m-1$
are exhausted.
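The steps of this subsection can be assembled into a short loop. The following is a minimal sketch under several stated assumptions, not a definitive implementation: the initial solution in each iteration is MMSE (written in the equivalent form $(A^*A + \sigma^2 I)^{-1}A^*y$ so that it also applies to the tall reduced matrices), the constellation is QPSK with points $\{\pm 1 \pm j\}$ and $s_{\min} = 2$, the sparsity estimate is taken as $\|\hat{e}_k\|_2 / s_{\min}$ clipped to $[1, m_k]$, and the noise variance is approximated by $\|r_k\|_2^2/m_k$. The fallback that feeds back the single smallest entry when nothing falls below the threshold is an addition made here to guarantee termination. All function names are hypothetical.

```python
import numpy as np

def detect_qpsk(x_soft):
    """Coefficient-by-coefficient detection onto {+-1 +- 1j}."""
    x_soft = np.asarray(x_soft, dtype=complex)
    return (np.where(x_soft.real >= 0, 1.0, -1.0)
            + 1j * np.where(x_soft.imag >= 0, 1.0, -1.0))

def mmse_equalize(A, y, sigma2):
    """MMSE estimate in the form (A^*A + sigma^2 I)^{-1} A^* y."""
    G = A.conj().T @ A + sigma2 * np.eye(A.shape[1])
    return np.linalg.solve(G, A.conj().T @ y)

def adaptive_threshold_dfe(A, y, sigma2, s_min=2.0):
    """Decision feedback with an adaptive threshold on the error estimate."""
    m = A.shape[1]
    x_out = np.zeros(m, dtype=complex)
    remaining = np.arange(m)                  # indices not yet fed back
    A_k = A.astype(complex)
    y_k = np.asarray(y, dtype=complex).copy()
    while remaining.size > 0:
        m_k = remaining.size
        x_hat = detect_qpsk(mmse_equalize(A_k, y_k, sigma2))
        r_k = y_k - A_k @ x_hat               # residual
        e_abs = np.abs(A_k.conj().T @ r_k)    # |error estimate|
        # Assumed sparsity estimate, kept within [1, m_k] so the log is >= 0.
        p_k = min(max(np.linalg.norm(e_abs) / s_min, 1.0), float(m_k))
        t_k = np.sqrt(2.0 * np.log(m_k / p_k) * np.linalg.norm(r_k) ** 2 / m_k)
        feed = e_abs < t_k                    # entries assumed to be correct
        if not feed.any():
            feed[np.argmin(e_abs)] = True     # added fallback: ensure progress
        x_out[remaining[feed]] = x_hat[feed]
        y_k = y_k - A_k[:, feed] @ x_hat[feed]   # remove their interference
        A_k = A_k[:, ~feed]                      # drop the corresponding columns
        remaining = remaining[~feed]
    return x_out
```

Each pass removes at least one index, so the loop runs at most $m$ times; when the threshold works as intended, most indices are removed in the first pass.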

E. Error Detection via $\ell_1$ Minimization

As is the case with the initial solution, the accuracy of our
error signal estimate can also influence the performance of our
algorithm. Since $e$ is sparse we can attempt to approximate $e$
by using $\ell_1$-minimization³, as is now common practice.

Let $\hat{x}_k$, $r_k$ be the solution and the residual in the $k$-th
iteration, as defined in the previous subsection. Then the
estimate of $e$ in each iteration can be obtained by solving the
following $\ell_1$-minimization problem:

$$\min_e \|e\|_1 \quad \text{s.t.} \quad \|A_k e - r_k\|_2^2 \le n\sigma^2. \tag{23}$$

The estimate $e$ obtained is a good approximation of the actual
error signal, at least in the high-SNR case. $\ell_1$-minimization
has been tremendously successful in recovering sparse signals
from underdetermined linear systems in the noise-free or high-SNR
setting. However, for the low-SNR case it is unfortunately
much less effective, even though we are not dealing with an
underdetermined system. In particular, if the noise is large
enough such that $\|r_k\|_2^2 \le n\sigma^2$, then the optimal solution
to (23) is $e = 0$, which is not useful. Since $n\sigma^2$ represents only
the expected energy of the noise, using a more conservative
choice of the bound in (23) can improve the result
somewhat. Furthermore, even though the solution obtained
via (23) will be mostly sparse, there can be some small,
non-zero entries due to the noise, so we would still need to
apply some kind of threshold before feeding back. However,
obtaining the error estimate via $\ell_1$-minimization does
not yield superior performance compared to using (9), as the
numerical simulations clearly demonstrate in the next section.
Obviously, one could try to obtain the error estimate using
greedy algorithms (orthogonal matching pursuit [17], subspace
pursuit [18]) used in compressed sensing as a less costly
alternative to $\ell_1$ optimization; however, they did not provide
any performance improvement over $\ell_1$ minimization.

$^1$The $\ell_1$ norm of a vector $a$ of length $n$ is given by $\|a\|_1 = \sum_{i=1}^{n} |a_i|$.
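The degenerate low-SNR case of the $\ell_1$ problem can be checked directly. The short argument below assumes that the constraint in (23) reads $\|Ae - r_k\|_2^2 \le n\sigma^2$, with $r_k$ the residual of the $k$-th iteration; it is a sanity check under that assumption rather than an additional result.

```latex
% Assumption: the residual energy does not exceed the noise-energy bound.
\|r_k\|_2^2 \le n\sigma^2
% Step 1: e = 0 is then feasible for (23).
\;\Longrightarrow\; \|A \cdot 0 - r_k\|_2^2 = \|r_k\|_2^2 \le n\sigma^2 .
% Step 2: e = 0 attains the smallest possible objective value,
% and it is the only vector with zero \ell_1 norm.
\|0\|_1 = 0 \le \|e\|_1 \quad \text{for all } e,
\qquad \|e\|_1 = 0 \iff e = 0 .
% Conclusion: the unique minimizer is e = 0, so no symbol is flagged as erroneous.
```

In other words, whenever the whole residual can be explained as noise within the allowed energy, the program returns the all-zero error estimate and carries no information about which symbols to feed back.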

IV. Simulation Results 

In this section we present our numerical results. We consider
the model as given in (3). We used $x$ with a length of 128
symbols chosen from a QPSK constellation. The optimization
toolbox CVX [19] has been used for solving both (6) and the
$\ell_1$ optimization problem.

We simulated the bit error rate (BER) performance for the 
following cases: 

1) Standard linear equalizer, labeled as "MMSE" in the 
plot. 

2) Solution of the convex relaxation problem ("inf").

3) Our adaptive thresholding algorithm, where the initial
solution is obtained via MMSE, and the error estimate
via (9) ("MMSE+thresh").

4) Our adaptive thresholding algorithm, where the initial
solution is obtained via the convex relaxation problem, and the
error estimate via (9) ("inf + thresh").

5) Our adaptive thresholding algorithm, where the initial
solution is obtained via MMSE, and the error estimate
via $\ell_1$ optimization (23) ("$\ell_1$ opt + thresh").

6) Feeding back the smallest entry of |e| in each iteration 
("Feed back"). 

Figure 7 depicts the results of our simulation. The first
comparison that we would like to point out is between
obtaining the initial solution using MMSE and using (7).
The performance of (7) is significantly better, by around 3.5 dB
at BER levels of 10^'^; however, we emphasize again that
finding $x$ using MMSE has a significantly lower computational
cost, especially when considering that an initial solution has
to be found in each iteration. We can then compare all the
thresholding scenarios. From the figure, we can see that using
$\ell_1$ optimization to find the error estimate in combination with
our adaptive thresholding has inferior performance even
compared with just finding an initial solution via (7). The
adaptive thresholding with MMSE has a 4 dB gain compared to
using just MMSE at BER levels of 10^^. Adaptive thresholding
with (7) has around 0.5 dB improvement compared to using
MMSE for an initial solution for BER = 10^'^. Finally, we can
see that feeding back one coefficient at a time has the best
performance; however, the drawback is that the number of
necessary iterations is equal to the block length $m$. We note here
that our adaptive thresholding algorithm usually converges
within three iterations for low SNR scenarios, independently
of the block size $m$. To illustrate the computational time
difference between feeding back one entry at a time, our adaptive
thresholding algorithm, and standard MMSE, we have measured the time it
took to run our simulation for 1000 QPSK symbols. Feeding
back one entry at a time took 9645 seconds, our thresholding algorithm took
321 seconds, and standard MMSE took 144 seconds. So feeding back
one entry at a time took around 30 times longer than adaptive thresholding, and
around 67 times longer than MMSE.
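The run-time ratios quoted above follow directly from the three measured times. The snippet below simply redoes that arithmetic; the dictionary keys and the helper name `slowdown` are illustrative labels, not identifiers from the simulation code.

```python
# Run times reported in the text, in seconds, for 1000 QPSK symbols.
run_time_s = {
    "feed back 1": 9645.0,
    "adaptive thresholding": 321.0,
    "MMSE": 144.0,
}


def slowdown(slow: str, fast: str) -> float:
    """How many times longer `slow` ran compared with `fast`."""
    return run_time_s[slow] / run_time_s[fast]


feedback_vs_thresh = slowdown("feed back 1", "adaptive thresholding")  # about 30.05
feedback_vs_mmse = slowdown("feed back 1", "MMSE")                     # about 66.98
thresh_vs_mmse = slowdown("adaptive thresholding", "MMSE")             # about 2.23
```

The last ratio shows that adaptive thresholding costs a little more than twice the run time of plain MMSE in this measurement.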

From the previous discussion, we can see that in addition 
to the superior performance compared to linear equalizers, our 
Fig. 7. BER performance of sparsity based thresholding with different ways
of obtaining the initial solution and error estimate, and of feeding back 1 entry at a
time, in case $A = HU$ where $H$ is a normalized Rayleigh fading diagonal
matrix, and $U$ is a DFT matrix



algorithm is very versatile: depending on how we find the
initial solution and the error estimate, we can choose to sacrifice
some performance in terms of BER for faster convergence.
Also, the threshold itself depends very little on the actual
system, so it can be easily adapted for different applications.
In addition, the algorithm is scalable and can easily be
applied to larger block sizes, with the same convergence rates and
performance. This is illustrated in Figures 8 and 9, where we
have shown the performance of our thresholding algorithm
using Hadamard and Haar matrices, respectively, instead of a
DFT matrix in (3). Our simulations show similar trends as for
the DFT matrix: for BER = 10^'^ we gain about 4 dB for both
Hadamard and Haar matrices by using MMSE and adaptive
thresholding, compared to just MMSE. In Figure 10 we have
shown the performance of our algorithm with a DFT matrix and a
block length of 1024. The performance improvement does not
change with increasing block size, and the algorithm still
converges within 3 iterations. Finally, in Figure 12 we show
the performance of our algorithm when 16-QAM modulation
is used. In this case our thresholding algorithm, in combination
with an initial solution obtained via (6), has around 10.5 dB
improvement over MMSE for BER ~ 10^^. Unfortunately,
poor MMSE performance has also significantly degraded the
performance of our thresholding algorithm when the initial
solution is obtained using MMSE.

In Figure 11 we have explored the performance of the threshold
given by (11) and the threshold given by (22) in order to see how
much we gain by exploiting the sparsity level of the error
estimate. We can see from the figure that we gain around
0.5 dB at BER = 10^'^ just by introducing the penalty factor of
log m/p.

A. Application To Large-System CDMA 

Here, we will briefly describe modifications needed to 
implement our algorithm for a large system code division 
Fig. 8. BER performance comparison between MMSE, sparsity based
thresholding and feeding back 1 entry at a time in case $A = HU$ where
$H$ is a normalized Rayleigh fading diagonal matrix, and $U$ is a Hadamard matrix

Fig. 9. BER performance comparison between MMSE, sparsity based
thresholding and feeding back 1 entry at a time in case $A = HU$ where
$H$ is a normalized Rayleigh fading diagonal matrix, and $U$ is a Haar matrix

Fig. 10. BER performance comparison between block length of 128 and 1024
using sparsity based thresholding in case $A = HU$ where $H$ is a normalized
Rayleigh fading diagonal matrix, and $U$ is a DFT matrix

Fig. 11. BER performance comparison of our adaptive thresholding
algorithms for threshold $\propto \log m$ and threshold $\propto \log m/p$



multiple access (CDMA) system and show the simulation results. We use the
following system model for a $K$-user system with a spreading
factor of $N$ [20]:

$$y = SPx + \omega \qquad (24)$$



In (24), $S$ represents the spreading matrix whose entries we
choose from a Gaussian distribution with zero mean and unit
variance. $P$ is a diagonal matrix, $P = \mathrm{diag}(\sqrt{\Gamma_1}, \ldots, \sqrt{\Gamma_K})$,
where $\Gamma_i$ denotes the signal to interference ratio of user $i$.
$y$, $x$ and $\omega$ are as defined in previous sections. From (24) we
can see that if there was no interference between users, $P$
would become an identity matrix, and we would exactly get
our model given in (3).
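As a rough illustration of the CDMA model, the sketch below draws one observation $y = SPx + \omega$ in plain Python. Several details are assumptions made only for this example: the function name `cdma_observation` and its arguments are hypothetical, $S$ is drawn with real Gaussian entries and no further normalization, the symbols are unit-energy QPSK, and the noise is complex Gaussian with standard deviation `noise_std` per real component.

```python
import math
import random
from typing import List, Sequence, Tuple

# Unit-energy QPSK constellation (assumed normalization).
QPSK = [complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]


def cdma_observation(
    num_users: int,
    spreading_factor: int,
    gamma: Sequence[float],
    noise_std: float,
    rng: random.Random,
) -> Tuple[List[List[float]], List[float], List[complex], List[complex]]:
    """Draw S, diag(P), x and y = S P x + omega for a K-user system."""
    # S: N x K spreading matrix with zero-mean, unit-variance Gaussian entries.
    S = [[rng.gauss(0.0, 1.0) for _ in range(num_users)]
         for _ in range(spreading_factor)]
    # Diagonal of P: square roots of the per-user ratios Gamma_i.
    p_diag = [math.sqrt(g) for g in gamma]
    # One QPSK symbol per user.
    x = [rng.choice(QPSK) for _ in range(num_users)]
    # Additive complex Gaussian noise (assumed noise model).
    omega = [complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
             for _ in range(spreading_factor)]
    y = [sum(S[n][k] * p_diag[k] * x[k] for k in range(num_users)) + omega[n]
         for n in range(spreading_factor)]
    return S, p_diag, x, y


# Perfect power control with Gamma = 1: P reduces to the identity matrix.
S, p_diag, x, y = cdma_observation(4, 6, [1.0] * 4, 0.0, random.Random(0))
```

With all $\Gamma_i = 1$ the diagonal of $P$ is all ones, so the observation reduces to $y = Sx + \omega$, which has the same form as the model in (3) with $S$ in place of $A$.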



In Figure 13 we have illustrated the performance of our
algorithm when applied to a CDMA system given by (24).
We use $N = K = 128$ and QPSK modulation. Matrix $S$ is a
random Gaussian matrix, and it is appropriately normalized.
We assume perfect power control, so we have that $\Gamma_1 = \ldots =
\Gamma_i = \ldots = \Gamma_K = \Gamma$. Figure 13 shows that MMSE in case $U$ is a
random Gaussian matrix performs very poorly. That somewhat
degrades the performance of our thresholding algorithm with
an MMSE initial solution, but at the BER level of 10^^ we
still have an improvement of 10 dB. When an initial solution
obtained via (6) is combined with our thresholding algorithm, we get the
same performance as when feeding back 1 symbol at a time,
which gives us an improvement of almost 13 dB over MMSE
performance.
Fig. 12. BER performance comparison for a block length of 128 and 16-QAM
modulation in case $A = HU$ where $H$ is a normalized Rayleigh fading
diagonal matrix, and $U$ is a DFT matrix

Fig. 13. BER performance comparison for a block length of 128 and 4-QAM
modulation in case $A = HU$ where $H$ is a normalized Rayleigh
fading diagonal matrix, and $U$ is a random Gaussian matrix



V. Conclusion

In this paper we propose a new decision feedback equalization
algorithm for SC-FDMA systems. The algorithm is
based on adaptive thresholding that exploits the sparsity of
the estimated error signal. We provide a theoretical framework
for multiple feedback symbol selection in each iteration, which
leads to very fast convergence. Our algorithm has a low
computational complexity, and even though the focus of our
paper is on SC-FDMA, it can easily be applied to different
existing technologies such as CDMA and MIMO-OFDM.
We illustrated the performance of our algorithm in numerical
simulations, and our algorithm shows a significant performance
improvement compared to linear equalizers, while the
computational time is much lower compared to feeding back
one symbol at a time.

While the algorithm presented in this paper offers a dramatically
improved BER performance over the linear equalizer,
there is still room for improvement, especially in the low
SNR region. Recently in the area of compressed sensing,
approximate message passing (AMP) algorithms, based on belief
propagation, have been successfully used to improve the
performance of iterative thresholding algorithms for sparse signal
recovery [21]. AMP can successfully account for correlations
in the data, which is certainly of importance in our setting.
Unfortunately, we cannot simply apply the same approach,
mostly because of the step of mapping the estimated initial
solution to the constellation points. How to adapt the
message-passing approach to our DFE problem is a topic of future
research.